-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add close_timeout option #1926
Add close_timeout option #1926
Conversation
76e5c96
to
a16b773
Compare
@@ -105,6 +106,11 @@ func (h *Harvester) Harvest() { | |||
if !h.sendEvent(event) { | |||
return | |||
} | |||
|
|||
if h.Config.CloseTTL > 0 && time.Since(h.startTime) > h.Config.CloseTTL { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config validation mandates CloseTTL can not be 0 (due to min=0,nonzero
). This on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, because I don't see any use cases for close_ttl = 0, as this would the harvester directly again.
2e5b57f
to
ff5df38
Compare
if h.config.CloseTimeout > 0 && time.Since(startTime) > h.config.CloseTimeout { | ||
logp.Info("Closing harvester because ttl was reached: %s", h.path) | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking for CloseTimeout at this places requires an event to be published for closeTimeout to function.
If reader is waiting long for a new line or sendEvent blocks, due to spooler being blocked on publisher, close_timeout might be very very late.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file reader has some 'inactive timeout'? The 'inactive timeout' should be <= close_timeout
.
What about multiline timeout? hm...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative implementation could be to implement the timeout in a separate go routine the close h.done
?
@urso I pushed an alternative implementation. Can you have a look? |
jenkins, retest it |
@@ -45,6 +46,7 @@ type harvesterConfig struct { | |||
CloseRemoved bool `config:"close_removed"` | |||
CloseRenamed bool `config:"close_renamed"` | |||
CloseEOF bool `config:"close_eof"` | |||
CloseTimeout time.Duration `config:"close_timeout" validate:"min=0,nonzero"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validate
enforces CloseTimeout>=1ns
. This enforces CloseTimeout always enabled. Why disallow 0 or -1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that nonzero should be removed. But -1 doesn't make sense from my point of view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, -1 is kinda similar to 0. CloseTimeout
should only be enabled if >0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is currently the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not. using nonzero
in validate
requires close_timeout > 0
in config files.
b2f8c91
to
913c3f2
Compare
@urso PR updated with adding a note to |
In discussion with @urso we realised there is a potential issue in reader / processor which could lead to and endless loop potentially. Before merging this PR, the reader / processor part should be cleaned up to guarantee proper stopping. |
Even thought implementation for close_timeout is not finished, it already showed up in the docs and config file. As close_timeout still needs more work, these were now removed to prevent any confusion. All changes for close_timeout go into elastic#1926
Even thought implementation for close_timeout is not finished, it already showed up in the docs and config file. As close_timeout still needs more work, these were now removed to prevent any confusion. All changes for close_timeout go into #1926
e58fef6
to
9574e3d
Compare
I redid the implementation and add a Close function to the log_file reader. This implementation has the following issues:
The above implementation is quite simple and works in cases where the output is behaving normally. This is in line with the oder To have some kind of circuit breaker which really stops the harvester independent of the output, I think we also need to make changes to the output to have potentially some feedback loop. Or we not even start new harvesters if there is some congestions (limit number of harvesters ...). It will still not solve the problem, but not open more and more files when output is stuck. |
func (r *LogFile) Close() { | ||
// Make sure reader is only closed once | ||
r.singleClose.Do(func() { | ||
close(r.done) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as the underlying reader might block on syscall, the file must be closed right after closing the channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
which error code is passed up the readers if file is closed by timeout (or done channel)? what happens to buffered (but incomplete) multiline events if file is closed early? |
If the channel is closed, the Error will be Incomplete multiline events will be sent on |
290ea69
to
632cd65
Compare
h.fileReader.Close() | ||
} else { | ||
h.file.Close() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
who is calling (*Harvester).close()
? like which go-routine? There a chance of race on h.fileReader being set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same go routine that calls close is also setting the fileReader, so there should not be any race condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm... why not call defer fileReader.Close()
right after creating in said go-routine? fileReader.Close()
already ensures it's doing the magic only once + is thread-safe.
a6f9980
to
ba6934c
Compare
|
||
In case close_timeout is used in combination with multiline events, it can happen that the harvester will be stopped in the middle of a multiline event, means only parts of the event will be sent. In case the harvester is continued at a later stage again and the file still exists, only the second part of the event will be sent. | ||
|
||
Close timeout will not apply, in case your output is stuck and no further events can be sent. At least one event further event must be sent to make close_timeout apply. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least one event further event must be sent to make close_timeout apply.
Can you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@ruflin needs rebase |
close_timeout will end the harvester after the predefined time. In case the output is blocked, close_timeout will only apply on the next event sent. This is identical with the close_* options.
@urso Rebase done |
jenkins, retest it |
close_timeout will end the harvester after the predefined time. In case the output is blocked, close_timeout will only apply on the next event sent. This is identical with the close_* options.